




WANG Wei, HAN Jiqing, ZHENG Tieran, ZHENG Guibin, TAO Yao

Citation: WANG Wei, HAN Jiqing, ZHENG Tieran, ZHENG Guibin, TAO Yao. Speaker Recognition Based on Fisher Discrimination Dictionary Learning[J]. Journal of Electronics & Information Technology, 2016, 38(2): 367-372. doi: 10.11999/JEIT150566


doi: 10.11999/JEIT150566


Speaker Recognition Based on Fisher Discrimination Dictionary Learning


The National Natural Science Foundation of China (61071181, 61471145), The Major Research Plan of the National Natural Science Foundation of China (91120303)

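  • Abstract: Sparse representation has been applied successfully to speaker recognition, where a well-constructed dictionary plays an important role. This paper introduces a structured dictionary learning method based on the Fisher criterion into a speaker recognition system. During discriminative dictionary learning, each dictionary corresponds to one class label, so the reconstruction error for training samples of the same class is small. At the same time, the sparse coding coefficients of the training samples are constrained to have minimal within-class scatter and maximal between-class scatter. Experiments on the NIST SRE 2003 database show that the proposed algorithm achieves an equal error rate (EER) of 7.62%, while an i-vector system with cosine distance scoring achieves an EER of 6.7%. Fusing the two systems yields an EER of 5.07%.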
  • Received: 2015-05-13
  • Revised: 2015-09-06
  • Published: 2016-02-19
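
As a rough illustration of the Fisher criterion mentioned in the abstract, the sketch below computes the traces of the within-class and between-class scatter of a set of coding vectors. A Fisher-style regularizer would favor a small first value and a large second one. The function name `fisher_scatter`, the speaker labels, and the toy coefficients are hypothetical and are not taken from the paper, whose exact objective is not given on this page.

```python
from collections import defaultdict


def fisher_scatter(codes, labels):
    """Return (tr(S_W), tr(S_B)) for coding vectors grouped by class label.

    codes  -- list of equal-length lists of floats (one coding vector per sample)
    labels -- list of class labels, one per coding vector
    """
    dim = len(codes[0])

    def mean(vectors):
        # Component-wise mean of a list of vectors.
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    def sq_dist(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((p - q) ** 2 for p, q in zip(a, b))

    by_class = defaultdict(list)
    for vector, label in zip(codes, labels):
        by_class[label].append(vector)

    global_mean = mean(codes)
    within = 0.0
    between = 0.0
    for vectors in by_class.values():
        class_mean = mean(vectors)
        # Spread of each class around its own mean.
        within += sum(sq_dist(v, class_mean) for v in vectors)
        # Spread of the class means around the global mean, weighted by class size.
        between += len(vectors) * sq_dist(class_mean, global_mean)
    return within, between


# Made-up coefficients for illustration only: two speakers, two vectors each.
codes = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.2, 1.0]]
labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
within, between = fisher_scatter(codes, labels)
```

On this toy input the within-class trace is much smaller than the between-class trace, which is the situation the abstract describes for coefficients that separate speakers well.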


